{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import cntk\n",
    "import numpy as np\n",
    "import scipy.sparse\n",
    "import cntk.tests.test_utils\n",
    "cntk.tests.test_utils.set_device_from_pytest_env() # (only needed for our build system)\n",
    "cntk.cntk_py.set_fixed_random_seed(1) # fix the random seed so that LR examples are repeatable\n",
    "from IPython.display import Image\n",
    "import matplotlib.pyplot\n",
    "%matplotlib inline\n",
    "matplotlib.pyplot.rcParams['figure.figsize'] = (40,40)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# CNTK: A Guided Tour\n",
    "\n",
    "This tutorial exposes many advanced features of CNTK and is aimed towards people who have had some previous exposure to deep learning and/or other deep learning toolkits. If you are a complete beginner we suggest you start with the CNTK 101 Tutorial and come here after you have covered most of the 100 series.\n",
    "\n",
    "Welcome to CNTK and the wonders of deep learning! Deep neural networks are redefining how computer programs\n",
    "are created. In addition to imperative, functional, declarative programming, we now have differentiable programming which effectively 'learns'\n",
    "programs from data.\n",
    "With CNTK, you can be part of this revolution.\n",
    "\n",
    "CNTK is the prime tool that Microsoft product groups use to create deep models for a whole range of products,\n",
    "from speech recognition and machine translation via various image-classification services\n",
    "to Bing search ranking.\n",
    "\n",
    "This tutorial is a guided tour of CNTK. It is primarily meant for users that are new to CNTK but have some experience with deep neural networks.\n",
    "The focus will be on how the basic steps of deep learning are done in CNTK,\n",
    "which we will show predominantly by example.\n",
    "This tour is not a complete API description. Instead, we refer the reader to the documentation\n",
    "and task-specific tutorials for more detailed information.\n",
    "\n",
    "To train a deep model, you will need to define your model structure, prepare your data so that it can be fed to CNTK, train the model and evaluate its accuracy, and deploy it.\n",
    "\n",
    "This guided tour is organized as follows:\n",
    "\n",
    " * Defining your **model structure**\n",
    "    * The CNTK programming model: Networks as Function Objects\n",
    "    * CNTK's Data Model: Tensors and Sequences of Tensors\n",
    "    * Your First CNTK Network: Logistic Regression\n",
    "    * Your second CNTK Network: MNIST Digit Recognition\n",
    "    * The Graph API: MNIST Digit Recognition Once More\n",
    " * Feeding your **data**\n",
    "    * Small data sets that fit into memory: numpy/scipy arrays/\n",
    "    * Large data sets: `MinibatchSource` class\n",
    "    * Spoon-feeding data: your own minibatch loop\n",
    " * **Training**\n",
    "    * Distributed Training\n",
    "    * Logging\n",
    "    * Checkpointing\n",
    "    * Cross-validation based training control\n",
    "    * Final evaluation\n",
    " * **Deploying** the model\n",
    "    * From Python\n",
    "    * From C++ and C#\n",
    "    * From your own web service\n",
    "    * Via an Azure web service\n",
    " * Conclusion\n",
    " \n",
    "\n",
    "To run this tutorial, you will need CNTK v2 and ideally a CUDA-capable GPU (deep learning is no fun without GPUs).\n",
    "\n",
    "# Defining Your Model Structure\n",
    "\n",
    "So let us dive right in. Below we will introduce CNTK's programming model--*networks are function objects* and CNTK's data model. We will put that into action for logistic regression and MNIST digit recognition,\n",
    "using CNTK's Functional API.\n",
    "Lastly, CNTK also has a lower-level, TensorFlow/Theano-like graph API. We will replicate one example with it.\n",
    "\n",
    "### The CNTK Programming Model: Networks are Function Objects\n",
    "\n",
    "In CNTK, a neural network is a function object.\n",
    "On one hand, a neural network in CNTK is just a function that you can call\n",
    "to apply it to data.\n",
    "On the other hand, a neural network contains learnable parameters\n",
    "that can be accessed like object members.\n",
    "Complicated function objects can be composed as hierarchies of simpler ones, which,\n",
    "for example, represent layers.\n",
    "The function-object approach is similar to Keras, Chainer, Dynet, Pytorch,\n",
    "and the recent Sonnet.\n",
    "\n",
    "The following illustrates the function-object approach with pseudo-code, using the example\n",
    "of a fully-connected layer (called `Dense` in CNTK)::\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "W = [[ 0.0375642   0.07627806  0.03758281 -0.03862683 -0.04887058]\n",
      " [ 0.00286547  0.03796192 -0.02116286 -0.02630954 -0.04653097]]\n",
      "y = [ 0.0432681   0.15103742 -0.00474287 -0.09099352 -0.14098707]\n"
     ]
    }
   ],
   "source": [
    "# numpy *pseudo-code* for CNTK Dense layer (simplified, e.g. no back-prop)\n",
    "def Dense(out_dim, activation):\n",
    "    # create the learnable parameters\n",
    "    b = np.zeros(out_dim)\n",
    "    W = np.ndarray((0,out_dim)) # input dimension is unknown\n",
    "    # define the function itself\n",
    "    def dense(x):\n",
    "        if len(W) == 0:         # first call: reshape and initialize W\n",
    "            W.resize((x.shape[-1], W.shape[-1]), refcheck=False)\n",
    "            W[:] = np.random.randn(*W.shape) * 0.05\n",
    "        return activation(x.dot(W) + b)\n",
    "    # return as function object: can be called & holds parameters as members\n",
    "    dense.W = W\n",
    "    dense.b = b\n",
    "    return dense\n",
    "\n",
    "d = Dense(5, np.tanh)    # create the function object\n",
    "y = d(np.array([1, 2]))  # apply it like a function\n",
    "W = d.W                  # access member like an object\n",
    "print('W =', d.W)\n",
    "print('y =', y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, this is only pseudo-code. In reality, CNTK function objects are not actual Python lambdas.\n",
    "Rather, they are represented internally as graph structures in C++ that encode the formula,\n",
    "similar to TensorFlow and Theano.\n",
    "This graph structure is wrapped in the Python class `Function` that exposes `__call__()` and `__getattr__()` methods.\n",
    "\n",
    "The function object is CNTK's single abstraction used to represent different levels of neural networks, which\n",
    "are only distinguished by convention:\n",
    "\n",
    " * **basic operations** without learnable parameters (e.g. `times()`, `__add__()`, `sigmoid()`...)\n",
    " * **layers** (`Dense()`, `Embedding()`, `Convolution()`...). Layers map one input to one output.\n",
    " * **recurrent step functions** (`LSTM()`, `GRU()`, `RNNStep()`). Step functions map a previous state and a new input to a new state.\n",
    " * **loss and metric** functions (`cross_entropy_with_softmax()`, `binary_cross_entropy()`, `squared_error()`, `classification_error()`...).\n",
    "   In CNTK, losses and metric are not special, just functions.\n",
    " * **models**. Models are defined by the user and map features to predictions or scores, and is what gets deployed in the end.\n",
    " * **criterion function**. The criterion function maps (features, labels) to (loss, metric).\n",
    "   The Trainer optimizes the loss by SGD, and logs the metric, which may be non-differentiable.\n",
    "\n",
    "Higher-order layers compose objects into more complex ones, including:\n",
    "\n",
    " * layer **stacking** (`Sequential()`, `For()`)\n",
    " * **recurrence** (`Recurrence()`, `Fold()`, `UnfoldFrom()`, ...)\n",
    "\n",
    "Networks are commonly defined by using existing CNTK functions (such as\n",
    "specific types of neural-network layers)\n",
    "and composing them using `Sequential()`.\n",
    "In addition, users can write their own functions\n",
    "as arbitrary Python expressions, as long as those consist of CNTK operations\n",
    "over CNTK data types.\n",
    "Python expressions get converted into the internal representation by wrapping them in a call to\n",
    "`Function()`. This is similar to Keras' `Lambda()`.\n",
    "Expressions can be written as multi-line functions through decorator syntax (`@Function`).\n",
    "\n",
    "Lastly, function objects enable parameter sharing. If you call the same\n",
    "function object at multiple places, all invocations will naturally share the same learnable parameters.\n",
    "\n",
    "In summary, the function object is CNTK's single abstraction for conveniently defining\n",
    "simple and complex models, parameter sharing, and training objectives.\n",
    "\n",
    "(Note that it is possible to define CNTK networks directly in terms of\n",
    "its underlying graph representation similar to TensorFlow and Theano. This is discussed\n",
    "further below.)\n",
    "\n",
    "### CNTK's Data model: Sequences of Tensors\n",
    "\n",
    "CNTK can operate on two types of data:\n",
    "\n",
    " * **tensors** (that is, N-dimensional arrays), dense or sparse\n",
    " * **sequences** of tensors\n",
    "\n",
    "The distinction is that the shape of a tensor is static during operation,\n",
    "while the length of a sequence depends on data.\n",
    "Tensors have *static axes*, while a sequence has an additional *dynamic axis*.\n",
    "\n",
    "In CNTK, categorical data is represented as sparse one-hot tensors, not as integer vectors.\n",
    "This allows to write embeddings and loss functions in a unified fashion as matrix products.\n",
    "\n",
    "CNTK adopts Python's type-annotation syntax to declare CNTK types (works with Python 2.7).\n",
    "For example,\n",
    "\n",
    " * `Tensor[(13,42)]` denotes a tensor with 13 rows and 42 columns, and\n",
    " * `Sequence[SparseTensor[300000]]` a sequence of sparse vectors, which for example could represent a word out of a 300k dictionary\n",
    "\n",
    "Note the absence of a batch dimension. CNTK hides batching from the user.\n",
    "We want users to think in tensors and sequences, and leave mini-batching to CNTK.\n",
    "Unlike other toolkits, CNTK can also automatically batch *sequences with different lengths*\n",
    "into one minibatch, and handles all necessary padding and packing.\n",
    "Workarounds like 'bucketing' are not needed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Your First CNTK Network: Simple Logistic Regression\n",
    "\n",
    "Let us put all of this in action for a very simple example of logistic regression.\n",
    "For this example, we create a synthetic data set of 2-dimensional normal-distributed \n",
    "data points, which should be classified into belonging to one of two classes.\n",
    "Note that CNTK expects the labels as one-hot encoded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "data =\n",
      " [[ 2.2741797   3.56347561]\n",
      " [ 5.12873602  5.79089499]\n",
      " [ 1.3574543   5.5718112 ]\n",
      " [ 3.54340553  2.46254587]]\n",
      "labels =\n",
      " [[ 1.  0.]\n",
      " [ 0.  1.]\n",
      " [ 0.  1.]\n",
      " [ 1.  0.]]\n"
     ]
    }
   ],
   "source": [
    "input_dim_lr = 2    # classify 2-dimensional data\n",
    "num_classes_lr = 2  # into one of two classes\n",
    "\n",
    "# This example uses synthetic data from normal distributions,\n",
    "# which we generate in the following.\n",
    "#  X_lr[corpus_size,input_dim] - input data\n",
    "#  Y_lr[corpus_size]           - labels (0 or 1), one-hot-encoded\n",
    "np.random.seed(0)\n",
    "def generate_synthetic_data(N):\n",
    "    Y = np.random.randint(size=N, low=0, high=num_classes_lr)  # labels\n",
    "    X = (np.random.randn(N, input_dim_lr)+3) * (Y[:,None]+1)   # data\n",
    "    # Our model expects float32 features, and cross-entropy\n",
    "    # expects one-hot encoded labels.\n",
    "    Y = scipy.sparse.csr_matrix((np.ones(N,np.float32), (range(N), Y)), shape=(N, num_classes_lr))\n",
    "    X = X.astype(np.float32)\n",
    "    return X, Y\n",
    "X_train_lr, Y_train_lr = generate_synthetic_data(20000)\n",
    "X_test_lr,  Y_test_lr  = generate_synthetic_data(1024)\n",
    "print('data =\\n', X_train_lr[:4])\n",
    "print('labels =\\n', Y_train_lr[:4].todense())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now define the model function. The model function maps input data to predictions.\n",
    "It is the final product of the training process.\n",
    "In this example, we use the simplest of all models: logistic regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model_lr = cntk.layers.Dense(num_classes_lr, activation=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we define the criterion function. The criterion function is\n",
    "the harness via which the trainer uses to optimize the model:\n",
    "It maps (input vectors, labels) to (loss, metric).\n",
    "The loss is used for the SGD updates. We choose cross entropy.\n",
    "Specifically, `cross_entropy_with_softmax()` first applies\n",
    "the `softmax()` function to the network's output, as\n",
    "cross entropy expects probabilities.\n",
    "We do not include `softmax()` in the model function itself, because\n",
    "it is not necessary for using the model.\n",
    "As the metric, we count classification errors (this metric is not differentiable).\n",
    "\n",
    "We define criterion function as Python code and convert it to a `Function` object.\n",
    "A single expression can be written as `Function(lambda x, y: `*expression of x and y*`)`,\n",
    "similar to Keras' `Lambda()`.\n",
    "To avoid evaluating the model twice, we use a Python function definition\n",
    "with decorator syntax. This is also a good time to tell CNTK about the\n",
    "data types of our inputs, which is done via the decorator `@Function.with_signature(`*argument types*`)`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "criterion_lr: Composite(data: Tensor[2], label_one_hot: SparseTensor[2]) -> Tuple[Tensor[1], Tensor[1]]\n",
      "W = [[-0.89681542 -0.89061725]\n",
      " [-0.11949861 -1.17324626]]\n"
     ]
    }
   ],
   "source": [
    "@cntk.Function.with_signature(cntk.layers.Tensor[input_dim_lr], cntk.layers.SparseTensor[num_classes_lr])\n",
    "def criterion_lr(data, label_one_hot):\n",
    "    z = model_lr(data)  # apply model. Computes a non-normalized log probability for every output class.\n",
    "    loss = cntk.cross_entropy_with_softmax(z, label_one_hot) # applies softmax to z under the hood\n",
    "    metric = cntk.classification_error(z, label_one_hot)\n",
    "    return loss, metric\n",
    "print('criterion_lr:', criterion_lr)\n",
    "print('W =', model_lr.W.value) # W now has known shape and thus gets initialized"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The decorator will 'compile' the Python function into CNTK's internal graph representation.\n",
    "Thus, the resulting `criterion` not a Python function but a CNTK `Function` object.\n",
    "\n",
    "We are now ready to train our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Learning rate per minibatch: 0.1\n",
      " Minibatch[   1-  50]: loss = 0.663274 * 1600, metric = 37.31% * 1600;\n",
      " Minibatch[  51- 100]: loss = 0.481867 * 1600, metric = 20.56% * 1600;\n",
      " Minibatch[ 101- 150]: loss = 0.402196 * 1600, metric = 12.94% * 1600;\n",
      " Minibatch[ 151- 200]: loss = 0.386619 * 1600, metric = 13.75% * 1600;\n",
      " Minibatch[ 201- 250]: loss = 0.328646 * 1600, metric = 9.19% * 1600;\n",
      " Minibatch[ 251- 300]: loss = 0.301831 * 1600, metric = 9.50% * 1600;\n",
      " Minibatch[ 301- 350]: loss = 0.299345 * 1600, metric = 9.44% * 1600;\n",
      " Minibatch[ 351- 400]: loss = 0.279577 * 1600, metric = 8.94% * 1600;\n",
      " Minibatch[ 401- 450]: loss = 0.281061 * 1600, metric = 8.25% * 1600;\n",
      " Minibatch[ 451- 500]: loss = 0.261366 * 1600, metric = 7.81% * 1600;\n",
      " Minibatch[ 501- 550]: loss = 0.244967 * 1600, metric = 7.12% * 1600;\n",
      " Minibatch[ 551- 600]: loss = 0.243953 * 1600, metric = 8.31% * 1600;\n",
      "Finished Epoch[1]: loss = 0.344399 * 20000, metric = 12.58% * 20000 2.137s (9358.9 samples/s);\n",
      "[[-1.25055134 -0.53687745]\n",
      " [-0.99188197 -0.30085728]]\n"
     ]
    }
   ],
   "source": [
    "learner = cntk.sgd(model_lr.parameters,\n",
    "                   cntk.learning_parameter_schedule(0.1))\n",
    "progress_writer = cntk.logging.ProgressPrinter(50)\n",
    "\n",
    "criterion_lr.train((X_train_lr, Y_train_lr), parameter_learners=[learner],\n",
    "                   callbacks=[progress_writer])\n",
    "\n",
    "print(model_lr.W.value) # peek at updated W"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "The `learner` is the object that actually performs the model update. Alternative learners include `momentum_sgd()` and `adam()`. The `progress_writer` is a stock logging callback that prints the output you see above, and can be replaced by your own\n",
    "or the stock `TensorBoardProgressWriter`to visualize training progress using TensorBoard.\n",
    "\n",
    "The `train()` function is feeding our data `(X_train_lr, Y_train_lr)` minibatch by minibatch to the model and updates it, where the data is a tuple in the same order as the arguments of `criterion_mn()`.\n",
    "\n",
    "Let us test how we are doing on our test set (this will also run minibatch by minibatch)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Finished Evaluation [1]: Minibatch[1-32]: metric = 8.11% * 1024;\n"
     ]
    }
   ],
   "source": [
    "test_metric_lr = criterion_lr.test((X_test_lr, Y_test_lr),\n",
    "                                   callbacks=[progress_writer]).metric"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "And lastly, let us run a few samples through our model and see how it is doing.\n",
    "Oops, `criterion` knew the input types, but `model_lr` does not,\n",
    "so we tell it using `update_signature()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "model_lr: Dense(x: Tensor[2]) -> Tensor[2]\n"
     ]
    }
   ],
   "source": [
    "model_lr.update_signature(cntk.layers.Tensor[input_dim_lr])\n",
    "print('model_lr:', model_lr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can call it like any Python function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Label    : [0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1]\n",
      "Predicted: [0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "z = model_lr(X_test_lr[:20])\n",
    "print(\"Label    :\", [label.todense().argmax() for label in Y_test_lr[:20]])\n",
    "print(\"Predicted:\", [z[i,:].argmax() for i in range(len(z))])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Your Second CNTK Network: MNIST Digit Recognition\n",
    "\n",
    "Let us do the same thing as above on an actual task--the MNIST benchmark, which is sort of the \"hello world\" of deep learning.\n",
    "The MNIST task is to recognize scans of hand-written digits. We first download and prepare the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAATUAAAA+CAYAAABKr4xzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAEKFJREFUeJztnXlMVNcXx++dMAOTEQJkwDGgGUh/\nilUsEjRDFJTGGozSFiIqaaBCNNCo6BDUGFqxoUIrdUmIEVRcSqQu1YIUDF0S0QQbihS1EqGLuIwo\nggsIFhe+vz/MvDIyMNsbLa/nk7zUMsP3vPfm8b3nnnsucACMIAhCKshe9wkQBEGICZkaQRCSgkyN\nIAhJQaZGEISkIFMjCEJSkKkRBCEpyNQIgpAUZGoEQUgKMjWCICQFmRpBEJLC5RXHoz1ZBEHYC7fm\nTZSpDaC7u5stWLCAcc4Z55zFxMSwR48eve7TEoXq6moWFRXFcnNz2ZMnT1736YxoGhsbmYeHBxs7\nduzrPhXCDGRqA8jMzGRVVVWCqVVWVrJ79+69svhr165lnHO2adMmUXV/+OEHlpeXx2pqalhWVhbL\nysoSVf+/Rk1NDXv06BGLiYkRVbe5uZnFxcWxwsJC1t/fL6r2v4nExETGOWcNDQ1O0f/XmtrRo0eZ\nRqNhKSkprKqq6pXEbG1tNfn/6Oho5uXl9UpiX79+nR05coRxzkWN+fvvv7PY2Fh25swZxhhjU6dO\nZW+++aZo+v81Ojs72a5duxhjjH3yySeiahsMBlZVVcVWrFjB0tLSWF9fn6j6lvjuu+9YfHy8MKgP\nPLRaLWtpaXE4xmeffcbKysqYr68vU6vVIpy1GQC8ysNq9u/fD5VKBfaiDoc5c+bg1q1btkhYTX19\nPVavXg0XFxdwzsE5h0qlwjfffOOUeC/T29uLsLAwcM4RHR2Nrq4uUXQ7OzuxYMECcM6hUChQWFiI\njo4OPH/+XBR9Szx79gx79uxBeno6tmzZgr6+vlcS15kUFBSAc47CwkKn6H/++efQaDSQyWRISkpC\nZ2enU+K8zNatWxEYGCg8/+aOPXv2OBRDr9eDMQbOOZqamuyRsMpn/rWmZmTlypVwd3cHYwwKhcIe\nCYtkZmYO+gBDQkKcEsscS5cuBWMM3t7euH79umi6H374oXA9GzZsEE3XGh4/fowlS5aY3NP09HSH\nBiaDwYCTJ09i/vz5wmBnPGbOnImysjIRr2AwJSUlcHV1xdq1a0UbeMxRVlYGV1dXyGQy5ObmOi2O\nkcrKSkyePFn4nBQKBVavXo3MzEyT48SJE3bHaG9vR2JiIjjnWLhwob0y0jA1ALh27RqSk5Ph4uKC\nqqoqe2XMcufOHUyfPt3kh2/Tpk34448/RI0zFMbRy9vbG+fPnxdNt6ysDN7e3uCcw83NDceOHRNN\n2xJtbW2IjY0F5xxarRZBQUHCvR03bhxycnLw5MkTq7RaW1tRWFiId955B0qlEowxeHp6Yvbs2Zg/\nfz4yMzMRFBQkmJszGT9+PMLDw50aw8iWLVugUCgQHBwMg8HgtDi1tbUYP3688PksXLgQd+/eFT2O\n8XmIjo52RF86pmYkKCgIGzdudFRG4OrVqwgNDR2Upd25c0e0GMOxZs0aKBQKcM5FNZ1z587By8tL\nMLS8vDzRtC1RUFCACRMmQCaTIScnBzdu3MDDhw+xe/duREZGIjIyEkqlEuvXr8ezZ8+G1bp58yb8\n/f3BGINarUZsbCzy8/MH/ZB3d3cjKioKjDFcvXrVKddVWloKxhiWLl3qFH1zeHh4gHMOvV7vtBgx\nMTHCcz979mw0NDQ4JQ7nHJMmTXJUhkzNEtu3bzcxM4VCAb1e/0pqP5WVlVAoFHBxccH+/ftF1T5w\n4IBwTRMnThRVeyhu3bqFvLw8uLq6Wpy+R0dHg3OOXbt2Dau5c+dOqNVqxMTEWJy2njhxAowxp2XY\nQUFBGDNmDHp7e52ibw7jDMLPz88p+iUlJXB3dwfnHMHBwTh58qRT4rS2tkKlUpmdvra2tqK+vh71\n9fVobW21JCUdU3v+/DlWrFgBlUqFxsZGe2UG4ebmZmJqzhwRB1JUVCQUTLOzs0XXN073tFotDh06\nJLq+OXQ6nXAfly1bhtu3bw/53o6ODqjVamRlZVnUvXnzJh49ejTse/r7+5GcnOy0TK2npwcajQZb\nt24VXXs4KioqnGZqJ0+ehEajET6z/Px80WMYSU1NRWJiosnXampqoNfr4evrC5lMBs45Ro8ejZyc\nnOGkRr6p3b9/H3v27EFAQAAYY9BoNLZKmOXBgwdISEgQbibnHP7+/qitrUV5eTnKy8tRUlICjUaD\n/fv3o7y8HJcvXxYldkNDg1D/ycjIwL1790TRNXL+/Hmo1WqnGaY52trahPu4detWdHR0DPv+5uZm\neHp6WmVq1nDp0iWn1dS6urqEelBzc7PJaxcvXkR8fDzi4+Nx6dIl0WPX1dU5xdSePn2KhIQE4TNT\nq9XIzs7G4cOH8eOPP4oaq729HYwxk9XOnJwcYVB/+b8+Pj64du3aUHIj09QaGhqwePFiZGRkICIi\nQnhYIyIi8Ouvv1ojYZHGxsZBdbT33nsP6enpQy5n/+9//0NGRoZD0w+DwYCoqChwzqHT6SxmIPZw\n4MAB4Z6J/YCaY9++ffD29sb48eOxd+9ei3UyACguLgbnHO7u7g7Hv3TpEjQajaiD3kBqa2vNDhDv\nvvsuNBoNfHx8oFarMWbMGNFj5+bmOsXUUlJSBpVdjLVdhUIBX19fnD59WpRYxlKDkaKiImi1WnDO\nIZPJEBcXh/Pnz+Pjjz8WkoxhFsxGpqlNmTJl0HK9XC4XtbfKnKlZe5SUlNgdd9++fUIP3MAM7d69\ne6LVag4ePCic608//TTsuRQUFKCgoAA3btywOU53dzdmzJgBuVwOzjlKS0ut/t76+nqhluMIFy5c\nMDE0Z9S7kpOTwTkX7pHBYMD06dMhl8uh1+tx9uxZrFy5EmFhYaLHNrYyJSQkiKZZV1eHqVOnmn22\np02bhkmTJoFzjqlTp6K0tBQ9PT0OxQsLCwNjDK2trYiMjBQysujoaJw5c0Z4X1ZWlvD1YRiZpmYw\nGJCTk4OPPvoIc+bMgUwmA2MM8+bNE22q5oip+fv72xWzvr4ebm5uUKlU+P777wEAR48exeLFi+Hj\n44Pg4GBRWjosmdqBAwcQEBBg0mjs4+ODU6dOWR2jv78faWlpYIxBJpNh2bJlNp1jcXExGGOYPXu2\nTd9n5M8//0RqaqrQ4hEcHCxaeWAgly9fBmMMISEh6OrqwrfffouEhAQolUpUVFQAeLGCrtFoUF5e\nLmrs06dPw83NDa6urqitrRVFs6OjAxERESbPs6urK5KSklBRUYGnT5/iypUrCA8PF14/fvy43fHa\n29uFrCw2NlbIxMz1qRkzt6KiouEkR6apvUxFRQVGjRoFxhgWLVpkj8QgLJmap6cnPD094eXlJWQU\nxkMul6O4uNjmmNnZ2WCMwcvLC7t27RqUjTLG8OWXXzp8bUOZWmNjo2BE5q558+bNVscw1tDkcrnN\ndbve3l7Ex8eDc25XTS0wMBAeHh4m923JkiWoqKjAX3/9ZbPecOj1enDO8dVXXwF40afm4eFhsopX\nW1uLoKAg3L9/X7S4Dx8+FAZ0zjnCw8Nx/PhxdHd3O6Tb0dGB5cuXQy6XY/To0UhOTsaDBw9M3tPX\n1yfsbnF1dXVoZgL8k6kZn7uwsDCTPrWBGZyvr68lOWmYGvDC2MQsBA9napGRkSYPT3V19aD3pKen\n2xwzKSnJRIMxBpVKJXRyM8bsMsuXuXv3rjA65ufno729HY8fP8b7778vxBk3bhzWr1+Puro6Ybph\nS+tHWloaOLdvtbiwsFAYOM6dO2fz93/66afQarUICQlBSEgItFqtkLG5u7tb3dRrDXq9HlOmTEFX\nV5fZPrWenh5hEcGeKfxQ5OXlQSaTCaZm/HdsbCxaWloc1q+srDT79b6+PuzcuVN4RidMmGBx0ccS\nRoM0XsdAQ2tqajLJ4Kqrqy3JScfUnj9/jg8++ACMMVHqJsOZ2vLly1FTUyMcb731lsnrbm5udjXn\nHjlyxKypKZVKcP6iDeLp06cOXxsA5OfnC3HmzJljsgCybt06k1U8Yy+ULe0KxlHXlv667u5u6PV6\neHp6YvLkyTh79qwtlzQsly9fxhdffAG5XI4dO3aIpjtr1ixhB0FQUBAWLFhg8vxt3LgRfn5+KCws\ntGqBxBqM2wJlMhm0Wi1OnTqFjIwMjBkzBjKZDCEhIYOyK7Ho6uoy2V3w9ttvO6wZHR0tJCRxcXEm\nrw1cBbXQymFEOqYGvEhT1Wq1KPv7HKmprVq1yq6YL2d8A6eBaWlpom7WNxgMmDFjhtnzb2pqQlNT\nEw4dOoTY2Fh4eXlBqVTatFJqi6k1Nzfj2LFj0Ol08Pb2RmpqqsPF56HIycmBn58f6uvrRdEbNWoU\ndDodAMDf318w/tu3b2P+/Pnw9/e3aYFkODo7O5GQkAAPDw/IZDIEBASYZGU///wz/Pz8BGOzNWOz\nZuq6bt064TlxcXGx1miGpaioyCRTM5KTk4NRo0aBc5t2GkjL1Do6OuDn5yfKCG+vqSUlJaGurs6u\nmNXV1SZ1IG9vbyxZskS0H8CXuXXrFjZv3mzSYPmymRqPuXPn2qRt1JgwYQK2bduGqqqqQcfMmTMR\nEREBjUYDT09PpKWlOaWXy9y5rVu3ThQtvV4PtVqN06dPY+zYsZg5cyaysrKEPkCxDA0Adu/eLUwz\nAwMDB/XEAS8WJTZs2IDRo0dbXQLp6enBmjVrEBoaOuz7SkpKhLYOzjlSU1Ptug5zDFz1ZIxh4sSJ\nQ9bYLDByTc1gMJi0cPT392PTpk1gjOHhw4fWygxJS0sLdDod3njjDavMzFgQ//vvv+2OaczU5s2b\nh9u3b4vedDsUV65cEbYuvWxq7u7uCA0NtbketH37dpNpirmDMYbQ0FAcPHgQV65ccdLVmWJcrRRj\nwQX4Z6FArVbDzc3N5N6J2dj89ddfC1POwMBAi/fr5s2baGtrs0o7OzsbnHOEhoaaXSEuKirC5MmT\nhTKI8drE3NRu3I44sEbIuV2b50euqc2dO1fY3tPZ2Ym4uDin/OqhlpYW7N27V2gQNHds2bJFWP1y\nBKOp1dTUiHDmtlFRUYHw8HDMmzcPubm5yMvLQ15enkNF4LKyMuzYsQMBAQHCvYqPj0dKSgpSUlJw\n8OBBp9V+huLw4cNQKpWimeiNGzdMOu/1ej0yMjLwyy+/iFZDA4BFixYJWZqYtUbgn/4vzl9sQ9Lr\n9dDr9UhKSjLJzDjn8PLyQnx8PPr7+0U9BwAmmZqvr6+9U9uRbWpyuRxKpVIYIVUqldN/X5YzuXjx\nIpRKJS5cuPC6T0VU7t+/j7a2NrS1tYm20GEv06ZNE2pgI4nS0lJERUWhuLhYVLMEXtTSCgoKTDIx\nc4dOp3PqPuHt27dj1qxZSEtLc6Qfc+Sa2rZt20x2FixcuNApzZWENOjv78eqVasgk8le6e+NG0ls\n3LjRrJmtXLkSLS0tePz48es+RWuwymc48Er/ah39iTxCVACwHTt2sIyMDMYYY729vUypVL7msyKc\nBP2JPELa/PbbbywxMVEwNJ1OR4ZGUKZGEMSIgTI1giD+e7i84nhWOS1BEIS9UKZGEISkIFMjCEJS\nkKkRBCEpyNQIgpAUZGoEQUgKMjWCICQFmRpBEJKCTI0gCElBpkYQhKQgUyMIQlKQqREEISnI1AiC\nkBRkagRBSAoyNYIgJAWZGkEQkoJMjSAISUGmRhCEpCBTIwhCUpCpEQQhKcjUCIKQFGRqBEFICjI1\ngiAkBZkaQRCS4v9FRqvLkHTWJAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x2443ba6bfd0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "input_shape_mn = (28, 28)  # MNIST digits are 28 x 28\n",
    "num_classes_mn = 10        # classify as one of 10 digits\n",
    "\n",
    "# Fetch the MNIST data. Best done with scikit-learn.\n",
    "try:\n",
    "    from sklearn import datasets, utils\n",
    "    mnist = datasets.fetch_mldata(\"MNIST original\")\n",
    "    X, Y = mnist.data / 255.0, mnist.target\n",
    "    X_train_mn, X_test_mn = X[:60000].reshape((-1,28,28)), X[60000:].reshape((-1,28,28))\n",
    "    Y_train_mn, Y_test_mn = Y[:60000].astype(int), Y[60000:].astype(int)\n",
    "except: # workaround if scikit-learn is not present\n",
    "    import requests, io, gzip\n",
    "    X_train_mn, X_test_mn = (np.fromstring(gzip.GzipFile(fileobj=io.BytesIO(requests.get('http://yann.lecun.com/exdb/mnist/' + name + '-images-idx3-ubyte.gz').content)).read()[16:], dtype=np.uint8).reshape((-1,28,28)).astype(np.float32) / 255.0 for name in ('train', 't10k'))\n",
    "    Y_train_mn, Y_test_mn = (np.fromstring(gzip.GzipFile(fileobj=io.BytesIO(requests.get('http://yann.lecun.com/exdb/mnist/' + name + '-labels-idx1-ubyte.gz').content)).read()[8:], dtype=np.uint8).astype(int) for name in ('train', 't10k'))\n",
    "\n",
    "# Shuffle the training data.\n",
    "np.random.seed(0) # always use the same reordering, for reproducability\n",
    "idx = np.random.permutation(len(X_train_mn))\n",
    "X_train_mn, Y_train_mn = X_train_mn[idx], Y_train_mn[idx]\n",
    "\n",
    "# Further split off a cross-validation set\n",
    "X_train_mn, X_cv_mn = X_train_mn[:54000], X_train_mn[54000:]\n",
    "Y_train_mn, Y_cv_mn = Y_train_mn[:54000], Y_train_mn[54000:]\n",
    "\n",
    "# Our model expects float32 features, and cross-entropy expects one-hot encoded labels.\n",
    "Y_train_mn, Y_cv_mn, Y_test_mn = (scipy.sparse.csr_matrix((np.ones(len(Y),np.float32), (range(len(Y)), Y)), shape=(len(Y), 10)) for Y in (Y_train_mn, Y_cv_mn, Y_test_mn))\n",
    "X_train_mn, X_cv_mn, X_test_mn = (X.astype(np.float32) for X in (X_train_mn, X_cv_mn, X_test_mn))\n",
    "\n",
    "# Have a peek.\n",
    "matplotlib.pyplot.rcParams['figure.figsize'] = (5, 0.5)\n",
    "matplotlib.pyplot.axis('off')\n",
    "_ = matplotlib.pyplot.imshow(np.concatenate(X_train_mn[0:10], axis=1), cmap=\"gray_r\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's define the CNTK model function to map (28x28)-dimensional images to a 10-dimensional score vector. We wrap that in a function so that later in this tutorial we can easily recreate it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def create_model_mn():\n",
    "    with cntk.layers.default_options(activation=cntk.ops.relu, pad=False):\n",
    "        return cntk.layers.Sequential([\n",
    "            cntk.layers.Convolution2D((5,5), num_filters=32, reduction_rank=0, pad=True), # reduction_rank=0 for B&W images\n",
    "            cntk.layers.MaxPooling((3,3), strides=(2,2)),\n",
    "            cntk.layers.Convolution2D((3,3), num_filters=48),\n",
    "            cntk.layers.MaxPooling((3,3), strides=(2,2)),\n",
    "            cntk.layers.Convolution2D((3,3), num_filters=64),\n",
    "            cntk.layers.Dense(96),\n",
    "            cntk.layers.Dropout(dropout_rate=0.5),\n",
    "            cntk.layers.Dense(num_classes_mn, activation=None) # no activation in final layer (softmax is done in criterion)\n",
    "        ])\n",
    "model_mn = create_model_mn()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This model is a tad bit more complicated! It consists of several convolution-pooling layeres and two\n",
    "fully-connected layers for classification which is typical for MNIST. This demonstrates several aspects of CNTK's Functional API.\n",
    "\n",
    "First, we create each layer using a function from CNTK's layers library (`cntk.layers`).\n",
    "\n",
    "Second, the higher-order layer `Sequential()` creates a new function that applies all those layers\n",
    "one after another. This is known [forward function composition](https://en.wikipedia.org/wiki/Function_composition).\n",
    "Note that unlike some other toolkits, you cannot `Add()` more layers afterwards to a sequential layer.\n",
    "CNTK's `Function` objects are immutable, besides their learnable parameters (to edit a `Function` object, you can `clone()` it).\n",
    "If you prefer that style, create your layers as a Python list and pass that to `Sequential()`.\n",
    "\n",
    "Third, the context manager `default_options()` allows to specify defaults for various optional arguments to layers,\n",
    "such as that the activation function is always `relu`, unless overriden.\n",
    "\n",
    "Lastly, note that `relu` is passed as the actual function, not a string.\n",
    "Any function can be an activation function.\n",
    "It is also allowed to pass a Python lambda directly, for example relu could also be\n",
    "realized manually by saying `activation=lambda x: cntk.ops.element_max(x, 0)`.\n",
    "\n",
    "The criterion function is defined like in the previous example, to map maps (28x28)-dimensional features and according\n",
    "labels to loss and metric."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "@cntk.Function.with_signature(cntk.layers.Tensor[input_shape_mn], cntk.layers.SparseTensor[num_classes_mn])\n",
    "def criterion_mn(data, label_one_hot):\n",
    "    z = model_mn(data)\n",
    "    loss = cntk.cross_entropy_with_softmax(z, label_one_hot)\n",
    "    metric = cntk.classification_error(z, label_one_hot)\n",
    "    return loss, metric"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the training, let us throw momentum into the mix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "N = len(X_train_mn)\n",
    "lrs = cntk.learning_parameter_schedule_per_sample([0.001]*12 + [0.0005]*6 + [0.00025]*6 + [0.000125]*3 + [0.0000625]*3 + [0.00003125], epoch_size=N)\n",
    "momentums = cntk.learners.momentum_schedule([0]*5 + [0.7788007830714049], epoch_size=N, minibatch_size=256)\n",
    "minibatch_sizes = cntk.minibatch_size_schedule([256]*6 + [512]*9 + [1024]*7 + [2048]*8 + [4096], epoch_size=N)\n",
    "\n",
    "learner = cntk.learners.momentum_sgd(model_mn.parameters, lrs, momentums)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This looks a bit unusual.\n",
    "First, the learning rate is specified as a list (`[0.001]*12 + [0.0005]*6 +`...). Together with the `epoch_size` parameter, this tells CNTK to use 0.001 for 12 epochs, and then continue with 0.005 for another 6, etc.\n",
    "\n",
    "Second, the learning rate is specified per-sample, and momentum is specified per 256 sampels \n",
    "(i.e. the reference minibatch size). These values specify directly the weight with which each\n",
    "sample's gradient contributes to the model, and how its contribution decays as training progresses;\n",
    "independent of the minibatch size, which is crucial for efficiency of GPUs and parallel training.\n",
    "This unique CNTK feature allows to adjust the minibatch size without retuning those parameters.\n",
    "Here, we grow it from 256 to 4096, leading to 3 times faster\n",
    "operation towards the end (on a Titan-X).\n",
    "\n",
    "Alright, let us now train the model. On a Titan-X, this will run for about a minute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Learning rate per 1 samples: 0.001\n",
      "Momentum per 256 samples: 0.0\n",
      "Finished Epoch[1]: loss = 0.708632 * 54000, metric = 23.72% * 54000 3.195s (16901.4 samples/s);\n",
      "Finished Epoch[2]: loss = 0.137199 * 54000, metric = 4.07% * 54000 1.770s (30508.5 samples/s);\n",
      "Finished Epoch[3]: loss = 0.088906 * 54000, metric = 2.57% * 54000 1.794s (30100.3 samples/s);\n",
      "Finished Epoch[4]: loss = 0.070951 * 54000, metric = 2.04% * 54000 1.583s (34112.4 samples/s);\n",
      "Finished Epoch[5]: loss = 0.061803 * 54000, metric = 1.86% * 54000 1.877s (28769.3 samples/s);\n",
      "Momentum per 256 samples: 0.7788007830714049\n",
      "Finished Epoch[6]: loss = 0.054887 * 54000, metric = 1.61% * 54000 1.802s (29966.7 samples/s);\n",
      "Finished Epoch[7]: loss = 0.046202 * 54000, metric = 1.37% * 54000 2.335s (23126.3 samples/s);\n",
      "Finished Epoch[8]: loss = 0.045087 * 54000, metric = 1.31% * 54000 1.305s (41379.3 samples/s);\n",
      "Finished Epoch[9]: loss = 0.040770 * 54000, metric = 1.23% * 54000 1.326s (40724.0 samples/s);\n",
      "Finished Epoch[10]: loss = 0.036437 * 54000, metric = 1.15% * 54000 1.314s (41095.9 samples/s);\n",
      "Finished Epoch[11]: loss = 0.034136 * 54000, metric = 1.03% * 54000 1.303s (41442.8 samples/s);\n",
      "Finished Epoch[12]: loss = 0.029288 * 54000, metric = 0.86% * 54000 1.281s (42154.6 samples/s);\n",
      "Learning rate per 1 samples: 0.0005\n",
      "Finished Epoch[13]: loss = 0.024868 * 54000, metric = 0.75% * 54000 1.293s (41763.3 samples/s);\n",
      "Finished Epoch[14]: loss = 0.022960 * 54000, metric = 0.65% * 54000 1.299s (41570.4 samples/s);\n",
      "Finished Epoch[15]: loss = 0.021247 * 54000, metric = 0.60% * 54000 1.294s (41731.1 samples/s);\n",
      "Finished Epoch[16]: loss = 0.020266 * 54000, metric = 0.61% * 54000 3.006s (17964.1 samples/s);\n",
      "Finished Epoch[17]: loss = 0.019458 * 54000, metric = 0.57% * 54000 1.052s (51330.8 samples/s);\n",
      "Finished Epoch[18]: loss = 0.018629 * 54000, metric = 0.57% * 54000 1.054s (51233.4 samples/s);\n",
      "Learning rate per 1 samples: 0.00025\n",
      "Finished Epoch[19]: loss = 0.016458 * 54000, metric = 0.50% * 54000 1.046s (51625.2 samples/s);\n",
      "Finished Epoch[20]: loss = 0.015672 * 54000, metric = 0.50% * 54000 1.037s (52073.3 samples/s);\n",
      "Finished Epoch[21]: loss = 0.013984 * 54000, metric = 0.42% * 54000 1.043s (51773.7 samples/s);\n",
      "Finished Epoch[22]: loss = 0.015272 * 54000, metric = 0.43% * 54000 1.054s (51233.4 samples/s);\n",
      "Finished Epoch[23]: loss = 0.014490 * 54000, metric = 0.40% * 54000 1.891s (28556.3 samples/s);\n",
      "Finished Epoch[24]: loss = 0.014596 * 54000, metric = 0.49% * 54000 0.940s (57446.8 samples/s);\n",
      "Learning rate per 1 samples: 0.000125\n",
      "Finished Epoch[25]: loss = 0.013999 * 54000, metric = 0.42% * 54000 0.950s (56842.1 samples/s);\n",
      "Finished Epoch[26]: loss = 0.012593 * 54000, metric = 0.40% * 54000 0.950s (56842.1 samples/s);\n",
      "Finished Epoch[27]: loss = 0.012298 * 54000, metric = 0.37% * 54000 0.947s (57022.2 samples/s);\n",
      "Learning rate per 1 samples: 6.25e-05\n",
      "Finished Epoch[28]: loss = 0.012227 * 54000, metric = 0.39% * 54000 0.955s (56544.5 samples/s);\n",
      "Finished Epoch[29]: loss = 0.012548 * 54000, metric = 0.34% * 54000 0.941s (57385.8 samples/s);\n",
      "Finished Epoch[30]: loss = 0.012083 * 54000, metric = 0.36% * 54000 0.937s (57630.7 samples/s);\n",
      "Learning rate per 1 samples: 3.125e-05\n",
      "Finished Epoch[31]: loss = 0.011832 * 54000, metric = 0.38% * 54000 3.316s (16284.7 samples/s);\n",
      "Finished Epoch[32]: loss = 0.011593 * 54000, metric = 0.34% * 54000 0.902s (59867.0 samples/s);\n",
      "Finished Epoch[33]: loss = 0.011058 * 54000, metric = 0.32% * 54000 0.897s (60200.7 samples/s);\n",
      "Finished Epoch[34]: loss = 0.011375 * 54000, metric = 0.36% * 54000 0.905s (59668.5 samples/s);\n",
      "Finished Epoch[35]: loss = 0.011668 * 54000, metric = 0.32% * 54000 0.898s (60133.6 samples/s);\n",
      "Finished Epoch[36]: loss = 0.011100 * 54000, metric = 0.32% * 54000 0.920s (58695.7 samples/s);\n",
      "Finished Epoch[37]: loss = 0.011169 * 54000, metric = 0.32% * 54000 0.911s (59275.5 samples/s);\n",
      "Finished Epoch[38]: loss = 0.011625 * 54000, metric = 0.33% * 54000 0.905s (59668.5 samples/s);\n",
      "Finished Epoch[39]: loss = 0.010790 * 54000, metric = 0.31% * 54000 0.913s (59145.7 samples/s);\n",
      "Finished Epoch[40]: loss = 0.011039 * 54000, metric = 0.32% * 54000 0.897s (60200.7 samples/s);\n",
      "Finished Evaluation [1]: Minibatch[1-313]: metric = 0.59% * 10000;\n"
     ]
    }
   ],
   "source": [
    "progress_writer = cntk.logging.ProgressPrinter()\n",
    "criterion_mn.train((X_train_mn, Y_train_mn), minibatch_size=minibatch_sizes,\n",
    "                   max_epochs=40, parameter_learners=[learner], callbacks=[progress_writer])\n",
    "test_metric_mn = criterion_mn.test((X_test_mn, Y_test_mn), callbacks=[progress_writer]).metric"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Graph API Example: MNIST Digit Recognition Again\n",
    "\n",
    "CNTK also allows networks to be written in graph style like TensorFlow and Theano. The following defines the same model and criterion function as above, and will get the same result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "criterion_mn: Composite(images: Tensor[28,28], labels: SparseTensor[10]) -> Tuple[Tensor[1], Tensor[1]]\n"
     ]
    }
   ],
   "source": [
    "images = cntk.input_variable(input_shape_mn, name='images')\n",
    "with cntk.layers.default_options(activation=cntk.ops.relu, pad=False):\n",
    "    r = cntk.layers.Convolution2D((5,5), num_filters=32, reduction_rank=0, pad=True)(images)\n",
    "    r = cntk.layers.MaxPooling((3,3), strides=(2,2))(r)\n",
    "    r = cntk.layers.Convolution2D((3,3), num_filters=48)(r)\n",
    "    r = cntk.layers.MaxPooling((3,3), strides=(2,2))(r)\n",
    "    r = cntk.layers.Convolution2D((3,3), num_filters=64)(r)\n",
    "    r = cntk.layers.Dense(96)(r)\n",
    "    r = cntk.layers.Dropout(dropout_rate=0.5)(r)\n",
    "    model_mn = cntk.layers.Dense(num_classes_mn, activation=None)(r)\n",
    "\n",
    "label_one_hot = cntk.input_variable(num_classes_mn, is_sparse=True, name='labels')\n",
    "loss = cntk.cross_entropy_with_softmax(model_mn, label_one_hot)\n",
    "metric = cntk.classification_error(model_mn, label_one_hot)\n",
    "criterion_mn = cntk.combine([loss, metric])\n",
    "print('criterion_mn:', criterion_mn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Feeding Your Data\n",
    "\n",
    "Once you have decided your model structure and defined it, you are facing the question on feeding\n",
    "your training data to the CNTK training process.\n",
    "\n",
    "The above examples simply feed the data as numpy/scipy arrays.\n",
    "That is only one of three ways CNTK provides for feeding data to the trainer:\n",
    "\n",
    " 1. As **numpy/scipy arrays**, for small data sets that can just be loaded into RAM.\n",
    " 2. Through instances of **CNTK's MinibatchSource class**, for large data sets that do not fit into RAM.\n",
    " 3. Through an **explicit minibatch-loop** when the above do not apply.\n",
    "\n",
    "### 1. Feeding Data Via Numpy/Scipy Arrays\n",
    "\n",
    "The `train()` and `test()` functions accept a tuple of numpy or scipy arrays for their `minibatch_source` arguments.\n",
    "The tuple members must be in the same order as the arguments of the `criterion` function that `train()` or `test()` are called on.\n",
    "For dense tensors, use numpy arrays, while sparse data should have the type `scipy.sparse.csr_matrix`.\n",
    "\n",
    "Each of the arguments should be a Python list of numpy/scipy arrays, where each list entry represents a data item. For arguments declared as `Sequence[...]`, the first axis of the numpy/scipy array is the sequence length, while the remaining axes are the shape of each token of the sequence. Arguments that are not sequences consist of a single tensor. The shapes, data types (`np.float32/float64`) and sparseness must match the argument types as declared in the criterion function.\n",
    "\n",
    "As an optimization, arguments that are not sequences can also be passed as a single large numpy/scipy array (instead of a list). This is what is done in the examples above.\n",
    "\n",
    "Note that it is the responsibility of the user to randomize the data.\n",
    "\n",
    "### 2. Feeding Data Using the `MinibatchSource` class for Reading Data\n",
    "\n",
    "Production-scale training data sometimes does not fit into RAM. For example, a typical speech corpus may be several hundred GB large. For this case, CNTK provides the `MinibatchSource` class, which provides:\n",
    "\n",
    " * A **chunked randomization algorithm** that holds only part of the data in RAM at any given time.\n",
    " * **Distributed reading** where each worker reads a different subset.\n",
    " * A **transformation pipeline** for images and image augmentation.\n",
    " * **Composability** across multiple data types (e.g. image captioning).\n",
    "\n",
    "At present, the `MinibatchSource` class implements a limited set of data types in the form of \"deserializers\":\n",
    "\n",
    " * **Images** (`ImageDeserializer`).\n",
    " * **Speech files** (`HTKFeatureDeserializer`, `HTKMLFDeserializer`).\n",
    " * Data in CNTK's **canonical text format (CTF)**, which encodes any of CNTK's data types in a human-readable text format.\n",
    "\n",
    "The following example of using the `ImageDeserializer` class shows the general pattern.\n",
    "For the specific input-file formats, please consult the documentation\n",
    "or data-type specific tutorials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "image_width, image_height, num_channels = (32, 32, 3)\n",
    "num_classes = 1000\n",
    "def create_image_reader(map_file, is_training):\n",
    "    transforms = []\n",
    "    if is_training:  # train uses data augmentation (translation only)\n",
    "        transforms += [\n",
    "            cntk.io.transforms.crop(crop_type='randomside', side_ratio=0.8)  # random translation+crop\n",
    "        ]\n",
    "    transforms += [  # to fixed size\n",
    "        cntk.io.transforms.scale(width=image_width, height=image_height, channels=num_channels, interpolations='linear'),\n",
    "    ]\n",
    "    # deserializer\n",
    "    return cntk.io.MinibatchSource(cntk.io.ImageDeserializer(map_file, cntk.io.StreamDefs(\n",
    "        features = cntk.io.StreamDef(field='image', transforms=transforms),\n",
    "        labels   = cntk.io.StreamDef(field='label', shape=num_classes)\n",
    "    )), randomize=is_training, max_sweeps = cntk.io.INFINITELY_REPEAT if is_training else 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.  Feeding Data Via an Explicit Minibatch Loop\n",
    "\n",
    "Instead of feeding your data as a whole to CNTK's `train()` and `test()` functions which implement a minibatch loop internally,\n",
    "you can realize your own minibatch loop and call the lower-level APIs `train_minibatch()` and `test_minibatch()`.\n",
    "This is useful when your data is not in a form suitable for the above, such as being generated on the fly as in variants of reinforcement learning. The `train_minibatch()` and `test_minibatch()` methods require you to instantiate an object of class `Trainer` that takes a subset of the arguments of `train()`. The following implements the logistic-regression example from above through explicit minibatch loops:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Learning rate per minibatch: 0.1\n",
      " Minibatch[   1-  50]: loss = 0.663274 * 1600, metric = 37.31% * 1600;\n",
      " Minibatch[  51- 100]: loss = 0.481867 * 1600, metric = 20.56% * 1600;\n",
      " Minibatch[ 101- 150]: loss = 0.402196 * 1600, metric = 12.94% * 1600;\n",
      " Minibatch[ 151- 200]: loss = 0.386619 * 1600, metric = 13.75% * 1600;\n",
      " Minibatch[ 201- 250]: loss = 0.328646 * 1600, metric = 9.19% * 1600;\n",
      " Minibatch[ 251- 300]: loss = 0.301831 * 1600, metric = 9.50% * 1600;\n",
      " Minibatch[ 301- 350]: loss = 0.299345 * 1600, metric = 9.44% * 1600;\n",
      " Minibatch[ 351- 400]: loss = 0.279577 * 1600, metric = 8.94% * 1600;\n",
      " Minibatch[ 401- 450]: loss = 0.281061 * 1600, metric = 8.25% * 1600;\n",
      " Minibatch[ 451- 500]: loss = 0.261366 * 1600, metric = 7.81% * 1600;\n",
      " Minibatch[ 501- 550]: loss = 0.244967 * 1600, metric = 7.12% * 1600;\n",
      " Minibatch[ 551- 600]: loss = 0.243953 * 1600, metric = 8.31% * 1600;\n",
      "Finished Epoch[1]: loss = 0.344399 * 20000, metric = 12.58% * 20000 2.384s (8389.3 samples/s);\n",
      "Finished Evaluation [2]: Minibatch[1-32]: metric = 8.11% * 1024;\n"
     ]
    }
   ],
   "source": [
    "# Recreate the model, so that we can start afresh. This is a direct copy from above.\n",
    "model_lr = cntk.layers.Dense(num_classes_lr, activation=None)\n",
    "@cntk.Function.with_signature(cntk.layers.Tensor[input_dim_lr], cntk.layers.SparseTensor[num_classes_lr])\n",
    "def criterion_lr(data, label_one_hot):\n",
    "    z = model_lr(data)  # apply model. Computes a non-normalized log probability for every output class.\n",
    "    loss = cntk.cross_entropy_with_softmax(z, label_one_hot) # this applies softmax to z under the hood\n",
    "    metric = cntk.classification_error(z, label_one_hot)\n",
    "    return loss, metric\n",
    "\n",
    "# Create the learner; same as above.\n",
    "learner = cntk.sgd(model_lr.parameters, cntk.learning_parameter_schedule(0.1))\n",
    "\n",
    "# This time we must create a Trainer instance ourselves.\n",
    "trainer = cntk.Trainer(None, criterion_lr, [learner], [cntk.logging.ProgressPrinter(50)])\n",
    "\n",
    "# Train the model by spoon-feeding minibatch by minibatch.\n",
    "minibatch_size = 32\n",
    "for i in range(0, len(X_train_lr), minibatch_size): # loop over minibatches\n",
    "    x = X_train_lr[i:i+minibatch_size] # get one minibatch worth of data\n",
    "    y = Y_train_lr[i:i+minibatch_size]\n",
    "    trainer.train_minibatch({criterion_lr.arguments[0]: x, criterion_lr.arguments[1]: y})  # update model from one minibatch\n",
    "trainer.summarize_training_progress()\n",
    "\n",
    "# Test error rate minibatch by minibatch\n",
    "evaluator = cntk.Evaluator(criterion_lr.outputs[1], [progress_writer]) # metric is the second output of criterion_lr()\n",
    "for i in range(0, len(X_test_lr), minibatch_size): # loop over minibatches\n",
    "    x = X_test_lr[i:i+minibatch_size] # get one minibatch worth of data\n",
    "    y = Y_test_lr[i:i+minibatch_size]\n",
    "    evaluator.test_minibatch({criterion_lr.arguments[0]: x, criterion_lr.arguments[1]: y})  # test one minibatch\n",
    "evaluator.summarize_test_progress()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training and Evaluating\n",
    "\n",
    "In our examples above, we use the `train()` function to train, and `test()` for evaluating.\n",
    "In this section, we want to walk you through the advanced options of `train()`:\n",
    "\n",
    " 1. **Distributed Training** on multiple GPUs using MPI.\n",
    " 2. Callbacks for **Progress Tracking**, **TensorBoard visualization**, **Checkpointing**,**Cross-validation**-based training contro, and **Testing** for the final model.\n",
    "\n",
    "### 1. Distributed Training\n",
    "\n",
    "CNTK makes distributed training easy. Out of the box, it supports three methods of distributed training:\n",
    "\n",
    " * Simple **data-parallel** training.\n",
    " * **1-bit SGD**.\n",
    " * **BlockMomentum**.\n",
    "\n",
    "Simple **data-parallel** training distributes each minibatch over N worker processes, where each process utilizes one GPU.\n",
    "After each minibatch, sub-minibatch gradients from all workers are aggregated before updating each model copy.\n",
    "This is often sufficient for convolutional networks, which have a high computation/communication ratio.\n",
    "\n",
    "**1-bit SGD** uses 1-bit data compression with residual feedback to speed up data-parallel training\n",
    "by reducing the data exchanges to 1 bit per gradient value.\n",
    "To avoid affecting convergence, each worker keeps a quantization-error residual which is added to the next minibatch's\n",
    "gradient. This way, all gradient values are eventually transmitted with full accuracy, albeit at a delay.\n",
    "This method has been found effective for networks where communication cost becomes the dominating factor,\n",
    "such as full-connected networks and some recurrent ones.\n",
    "This method has been found to only minimally degrade accuracy at good speed-ups.\n",
    "\n",
    "**BlockMomentum** improves communication bandwidth by exchanging gradients only every N minibatches.\n",
    "To avoid affecting convergence, BlockMomentum combines \"model averaging\" with the residual technique of 1-bit SGD:\n",
    "After N minibatches, block gradients are aggregated across workers, and added to all model copies at weight of 1/N,\n",
    "while a residual keeps (N-1)/N times the block gradient, which is added to the next block gradient, which\n",
    "then is in turn applied at a weight of 1/N and so on.\n",
    "\n",
    "Processes are started with and communicate through MPI. Hence, CNTK's distributed training\n",
    "works both within a single server and across multiple servers.\n",
    "All you need to do is\n",
    "\n",
    " * wrap your learner inside a `distributed_learner` object\n",
    " * execute the Python script using `mpiexec`\n",
    "\n",
    "Please see the example below when we put all together."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Callbacks\n",
    "\n",
    "The `callbacks` parameter of `train()` specifies actions that the `train()` function\n",
    "executes periodically, typically every epoch.\n",
    "The `callbacks` parameter is a list of objects, where the object type decides the specific callback action.\n",
    "\n",
    "Progress trackers allow to log progress (average loss and metric)\n",
    "periodically after N minibatches and after completing each epoch.\n",
    "Optionally, all of the first few minibatches can be logged.\n",
    "The `ProgressPrinter` callback logs to stderr and file, while `TensorBoardProgressWriter`\n",
    "logs events for visualization in TensorBoard.\n",
    "You can also write your own progress tracker class.\n",
    "\n",
    "Next, the `CheckpointConfig` class denotes a callback that writes a checkpoint file every epoch, and automatically restarts training at the latest available checkpoint.\n",
    "\n",
    "The `CrossValidationConfig` class tells CNTK to periodically evaluate the model on a cross-validation data set,\n",
    "and then call a user-specified callback function, which can then update the learning rate of return `False` to indicate early stopping.\n",
    "\n",
    "Lastly, `TestConfig` instructs CNTK to evaluate the model at the end on a given test set.\n",
    "This is the same as the explicit `test()` call in our examples above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Putting it all Together: Advanced Training Example\n",
    "\n",
    "Let us now put all of the above examples together into a single training. The following example runs our MNIST example from above with logging, TensorBoard events, checkpointing, CV-based training control, and a final test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Redirecting log to file my.log\n"
     ]
    }
   ],
   "source": [
    "# Create model and criterion function.\n",
    "model_mn = create_model_mn()\n",
    "@cntk.Function.with_signature(cntk.layers.Tensor[input_shape_mn], cntk.layers.SparseTensor[num_classes_mn])\n",
    "def criterion_mn(data, label_one_hot):\n",
    "    z = model_mn(data)\n",
    "    loss = cntk.cross_entropy_with_softmax(z, label_one_hot)\n",
    "    metric = cntk.classification_error(z, label_one_hot)\n",
    "    return loss, metric\n",
    "\n",
    "# Create the learner.\n",
    "learner = cntk.learners.momentum_sgd(model_mn.parameters, lrs, momentums)\n",
    "\n",
    "# Wrap learner in a distributed learner for 1-bit SGD.\n",
    "# In this example, distributed training kicks in after a warm-start period of one epoch.\n",
    "learner = cntk.train.distributed.data_parallel_distributed_learner(learner, distributed_after=1, num_quantization_bits=1)\n",
    "\n",
    "# Create progress callbacks for logging to file and TensorBoard event log.\n",
    "# Prints statistics for the first 10 minibatches, then for every 50th, to a log file.\n",
    "progress_writer = cntk.logging.ProgressPrinter(50, first=10, log_to_file='my.log')\n",
    "tensorboard_writer = cntk.logging.TensorBoardProgressWriter(50, log_dir='my_tensorboard_logdir',\n",
    "                                 rank=cntk.train.distributed.Communicator.rank(), model=criterion_mn)\n",
    "\n",
    "# Create a checkpoint callback.\n",
    "# Set restore=True to restart from available checkpoints.\n",
    "epoch_size = len(X_train_mn)\n",
    "checkpoint_callback_config = cntk.CheckpointConfig('model_mn.cmf', epoch_size, preserve_all=True, restore=False)\n",
    "\n",
    "# Create a cross-validation based training control.\n",
    "# This callback function halves the learning rate each time the cross-validation metric\n",
    "# improved less than 5% relative, and stops after 6 adjustments.\n",
    "prev_metric = 1 # metric from previous call to the callback. Error=100% at start.\n",
    "def adjust_lr_callback(index, average_error, cv_num_samples, cv_num_minibatches):\n",
    "    global prev_metric\n",
    "    if (prev_metric - average_error) / prev_metric < 0.05: # did metric improve by at least 5% rel?\n",
    "        learner.reset_learning_rate(cntk.learning_parameter_schedule_per_sample(learner.learning_rate() / 2))\n",
    "        if learner.learning_rate() < lrs[0] / (2**7-0.1): # we are done after the 6-th LR cut\n",
    "            print(\"Learning rate {} too small. Training complete.\".format(learner.learning_rate()))\n",
    "            return False # means we are done\n",
    "        print(\"Improvement of metric from {:.3f} to {:.3f} insufficient. Halving learning rate to {}.\".format(prev_metric, average_error, learner.learning_rate()))\n",
    "    prev_metric = average_error\n",
    "    return True # means continue\n",
    "\n",
    "cv_callback_config = cntk.CrossValidationConfig((X_cv_mn, Y_cv_mn), 3*epoch_size, minibatch_size=256,\n",
    "                                                callback=adjust_lr_callback, criterion=criterion_mn)\n",
    "\n",
    "# Callback for testing the final model.\n",
    "test_callback_config = cntk.TestConfig((X_test_mn, Y_test_mn), criterion=criterion_mn)\n",
    "\n",
    "# Train!\n",
    "callbacks = [progress_writer, tensorboard_writer, checkpoint_callback_config, cv_callback_config, test_callback_config]\n",
    "progress = criterion_mn.train((X_train_mn, Y_train_mn), minibatch_size=minibatch_sizes,\n",
    "                              max_epochs=50, parameter_learners=[learner], callbacks=callbacks)\n",
    "\n",
    "# Progress is available from return value\n",
    "losses = [summ.loss for summ in progress.epoch_summaries]\n",
    "print('loss progression =', \", \".join([\"{:.3f}\".format(loss) for loss in losses]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unfortunately, MPI cannot be used from a Jupyter notebook; hence, the `distributed_learner` above actually has no effect.\n",
    "You can find the same example\n",
    "as a standalone Python script under `Examples/1stSteps/MNIST_Complex_Training.py` to run under MPI, for example under MSMPI as\n",
    "\n",
    "`mpiexec -n 4 -lines python -u Examples/1stSteps/MNIST_Complex_Training.py`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploying your Model\n",
    "\n",
    "Your ultimate purpose of training a deep neural network is to deploy it as part of your own program or product.\n",
    "Since this involves programming languages other than Python,\n",
    "we will only give a high-level overview here, and refer you to specific examples.\n",
    "\n",
    "Once you completed training your model, it can be deployed in a number of ways.\n",
    "\n",
    " * Directly in your **Python** program.\n",
    " * From any other language that CNTK supports, including **C++** and **C#**.\n",
    " * From **your own web serive**.\n",
    " * Through a web service deployed to **Microsoft Azure**.\n",
    "\n",
    "The first step in all cases is to make sure your model's input types are known by calling `update_signature()`, and then to save your model to disk after training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model_mn.update_signature(cntk.layers.Tensor[input_shape_mn])\n",
    "model_mn.save('mnist.cmf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Deploying your model in a Python-based program is easy: Since networks are function objects that are callable, like a function, simply load the model, and call it with inputs, as we have already shown above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# At program start, load the model.\n",
    "classify_digit = cntk.Function.load('mnist.cmf')\n",
    "\n",
    "# To apply model, just call it.\n",
    "image_input = X_test_mn[8345]        # (pick a random test digit for illustration)\n",
    "scores = classify_digit(image_input) # call the model function with the input data\n",
    "image_class = scores.argmax()        # find the highest-scoring class\n",
    "\n",
    "# And that's it. Let's have a peek at the result\n",
    "print('Recognized as:', image_class)\n",
    "matplotlib.pyplot.axis('off')\n",
    "_ = matplotlib.pyplot.imshow(image_input, cmap=\"gray_r\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Models can be deployed directly from programs written in other programming languages for which bindings exist.\n",
    "Please see the following example programs for an example similar to the Python one above:\n",
    "\n",
    " * C++: `Examples/Evaluation/CNTKLibraryCPPEvalCPUOnlyExamples/CNTKLibraryCPPEvalCPUOnlyExamples.cpp`\n",
    " * C#: `Examples/Evaluation/CNTKLibraryCSEvalCPUOnlyExamples/CNTKLibraryCSEvalExamples.cs`\n",
    "\n",
    "To deploy a model from your own web service, load and invoke the model in the same way.\n",
    "\n",
    "To deploy a model via an Azure web service, follow this tutorial: `Examples/Evaluation/CNTKAzureTutorial01`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "\n",
    "This tutorial provided an overview of the five main tasks of creating and using a deep neural network with CNTK.\n",
    "\n",
    "We first examined CNTK's Functional programming and its tensor/sequence-based data model.\n",
    "Then we considered the possible ways of feeding data to CNTK, including directly from RAM,\n",
    "through CNTK's data-reading infrastructure (`MinibatchSource`), and spoon-feeding through a custom minibatch loop.\n",
    "We then took a look at CNTK's advanced training options, including distributed training, logging to TensorBoard, checkpointing, CV-based training control, and final model evaluation.\n",
    "Lastly, we briefly looked into model deployment.\n",
    "\n",
    "We hope this guided your have you a good starting point for your own ventures with CNTK. Please enjoy!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
